Applying Multiple Characteristics and Techniques to Obtain High Levels of Performance in Information Retrieval at NTCIR-4
Authors
Abstract
Our information retrieval system takes advantage of many characteristics of the information and applies a range of sophisticated techniques. It uses Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective. It also exploits characteristics of newspapers, such as locational information. We present our application of Fujita’s method, in which longer terms are used in retrieval but given less weight than the shortest terms; this allows us to use both compound and single-word terms. We describe the statistical test used to expand queries through an automatic feedback process. This test yields terms that have been statistically confirmed to be related to the top-ranked documents obtained in the first retrieval. We also used QIDF, an IDF computed over queries rather than over documents. It lowers the scores of stop words that occur in many queries, which makes it very useful for foreign languages whose stop words we cannot examine. We participated in three monolingual information retrieval tasks (Korean, Japanese, and English) at NTCIR-4. We obtained relatively high precision in all of these tasks and, in particular, the best precision in Korean description-based monolingual information retrieval.
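To make the scoring ideas in the abstract more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It uses the standard Okapi BM25 weight as a stand-in for Robertson's 2-Poisson model, a hypothetical `qidf` formula (an IDF over the query set), a hypothetical `COMPOUND_DISCOUNT` factor to illustrate Fujita-style de-emphasis of longer terms, and a simple Rocchio update that treats the top-ranked documents as pseudo-relevant. The statistical test used to select expansion terms and the use of locational information are not reproduced. All parameter values are illustrative, not the ones used at NTCIR-4.

```python
import math
from collections import Counter

# Hypothetical parameter values, chosen only for illustration.
K1 = 1.2
B = 0.75
COMPOUND_DISCOUNT = 0.5  # hypothetical weight factor for longer (compound) terms


def bm25_weight(tf, df, doc_len, avg_doc_len, n_docs, k1=K1, b=B):
    """Okapi BM25 term weight, a common approximation to the 2-Poisson model.

    Uses the IDF variant with "+1" inside the log so the weight stays non-negative.
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf + k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm


def qidf(term, queries):
    """Hypothetical QIDF: IDF computed over a set of queries (each a set of terms).

    A term that occurs in every query (stop-word-like) gets a factor of 0.
    """
    n_q = len(queries)
    qf = sum(1 for q in queries if term in q)
    return math.log((n_q + 1) / (qf + 1))


def score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, queries):
    """Score one document; compound terms are assumed to be space-joined strings."""
    total = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        w = bm25_weight(tf, df[term], doc_len, avg_doc_len, n_docs)
        w *= qidf(term, queries)  # down-weight terms common across queries
        if " " in term:  # longer compound term: keep it, but de-emphasize it
            w *= COMPOUND_DISCOUNT
        total += w
    return total


def rocchio(query_vec, top_docs, alpha=1.0, beta=0.75):
    """Rocchio update with pseudo-relevant top-ranked docs only (no negative term)."""
    new = Counter({t: alpha * w for t, w in query_vec.items()})
    for doc in top_docs:
        for t, w in doc.items():
            new[t] += beta * w / len(top_docs)
    return new
```

Multiplying the QIDF factor into each term weight lets query-frequent terms fade out without a hand-built stop-word list, and the compound discount keeps both a compound term and its component words in the query while letting the shorter terms dominate the score.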
Similar resources
Applying Multiple Characteristics and Techniques to Obtain High Levels of Performance in Information Retrieval
Our information retrieval system, which achieves its goals by taking advantage of numerous characteristics of the information and applying numerous sophisticated techniques, is described. Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective, have been applied in the system. Characteristics of newspapers such as locational information were applied. We give exam...
Applying Multiple Characteristics and Techniques in the NICT Information Retrieval System at NTCIR-6
Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. It uses Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective. Characteristics of newspapers such as locational information are used. We present our application of Fujita’s method, where longer terms are used in retrieval by...
Experiments on Chinese-English Cross-language Retrieval at NTCIR-4
The AI Lab group participated in the cross-language retrieval task at NTCIR-4. Aiming at a practical retrieval system, we applied a dictionary-based approach incorporating phrasal translation, co-occurrence disambiguation, and query expansion techniques. Although experimental results were not as good as we expected, our study demonstrated the feasibility of applying CLIR techniques in real-wo...
Applying Multiple Characteristics and Techniques in the NICT Information Retrieval System in NTCIR-5
Our information retrieval system takes advantage of numerous characteristics of information and uses numerous sophisticated techniques. Robertson’s 2-Poisson model and Rocchio’s formula, both of which are known to be effective, are used in the system. Characteristics of newspapers such as locational information are used. We present our application of Fujita’s method, where longer terms are used ...
Toshiba ASKMi at NTCIR-4 QAC2
Toshiba participated in NTCIR-4 QAC2 Subtask 1: this is our first QAC participation. Using our newly developed Japanese QA system called ASKMi, we submitted two runs, one using Okapi/BM25 for document retrieval (TSB-A) and the other using Boolean AND constraints before applying Okapi/BM25 (TSB-B). We achieved the 5th best performance among the 17 participants (8th and 9th among the 25 submitted...